Engineering in the AI Era: Insights from OpenAI
The Changing Role of the Engineer
[0:00:00] At OpenAI, 95% of engineers use Codex, and Codex reviews 100% of the engineering team's pull requests. It is difficult to identify a profession that has changed more in the past few years than software engineering. Engineers are increasingly moving into roles similar to tech leads, managing large fleets of agents. The experience is akin to being a wizard casting spells that go out and perform tasks autonomously.
The New Economics of Startups
[0:00:18] Many observers have yet to price in the second- and third-order effects of the one-person billion-dollar startup. To enable such a company, there might be a hundred other small startups building bespoke software. This trend suggests that the industry may be entering a golden age of B2B SaaS.
The 100% AI Codebase Experiment
[0:00:31] There is a growing stress associated with agents not functioning as intended. Within OpenAI, a team is currently conducting an experiment by maintaining a codebase that is 100% written by Codex. This team encounters the specific challenges inherent in agent-driven development. Unlike traditional environments where a developer might roll up their sleeves to manually fix an issue, this team operates without that standard escape hatch, forcing them to rely entirely on the AI's capabilities.
Strategy for Building in a Rapidly Changing Field
[0:00:47] In the field of AI, simply listening to customers is not always the most effective strategy. The models and the field itself are evolving so rapidly that they tend to disrupt existing systems; the models will effectively eat your scaffolding for breakfast. The core advice for those who do not want to miss the boat is to build for where the models are going rather than where they are today. Kevin Weil, the VP of Science at OpenAI, often remarks that today's models are the worst they will ever be.
Guest Profile: Sherwin Wu
[0:01:11] Sherwin Wu serves as the Head of Engineering for OpenAI’s API and developer platform. Given that nearly every AI startup integrates with OpenAI’s APIs, Sherwin possesses a uniquely broad perspective on current industry trends and the future direction of technology.
Engineering Intelligence and Error Tracking
[0:01:30] To thrive in the AI era, organizations must adapt quickly. DX is a developer intelligence platform designed by leading researchers to help leaders understand which tools are working and how they drive value. Companies such as Dropbox, Booking.com, Adyen, and Intercom use DX to gain insights into how AI impacts engineering productivity.
[0:02:14] Applications can break through crashes, slowdowns, and regressions. Sentry provides a connected view of these errors, tracing each one to the specific commit, the developer responsible, and the exact line of code. Seer, Sentry's AI debugging agent, uses this context to identify root causes, suggest fixes, and open pull requests. It also reviews pull requests to flag potential breaking changes.
AI Integration at OpenAI
[0:03:15] Sherwin Wu still writes code occasionally, though he notes that for managers it is now significantly easier to use AI tools than to code by hand. At OpenAI, all code produced by engineering managers is currently written by Codex. There is a tangible internal energy around the progress of these tools. The exact percentage of code written by AI is difficult to measure, because nearly all of it is generated by AI first, but the data shows that 95% of engineers use Codex daily and Codex reviews 100% of pull requests.
The Impact on Productivity and Workflow
Every piece of code that is merged and goes into production now has Codex overseeing it [0:04:23]. The tool actively suggests improvements and changes within pull requests (PRs). Data shows that engineers who use Codex more heavily open 70% more PRs than those who do not [0:04:48]. This productivity gap is widening as engineers learn to use the tool more effectively over time.
The Evolution of Trust
Currently, 95% of engineers at OpenAI have their code written by AI, which they then review [0:05:13]. While this shift has required a period of adjustment, the level of trust in the model continues to rise. Kevin Weil, the VP of Science at OpenAI, frequently notes that "this is the worst the models will ever be" [0:05:55]. This principle holds true for software engineering as well; as the models improve, engineers find themselves trusting the AI to handle increasingly complex tasks autonomously.
Practical Examples and Agent Interaction
Developers such as Peter, who works on OpenClaw (previously known as Clawdbot and Moltbot), have reached a point where they trust Codex almost implicitly [0:06:13]. He has shared that he is often confident enough in the model's output to commit it directly to the master branch. Beyond individual coding tasks, the industry is seeing the emergence of AI agents that can communicate with one another. This shift is described as surreal, like the film Her manifesting in real life [0:06:51].
From Coder to Agent Manager
The role of the software engineer has changed more significantly in the past few years than perhaps any other profession. The job has transitioned from writing every line of code to managing and overseeing the output of AI [0:07:04]. Over the next 12 to 24 months, the industry will continue to figure out new standards for this role. Individual contributor engineers are effectively becoming tech leads, managing "fleets and fleets of agents" [0:08:04]. It is now common for engineers to manage 10 to 20 parallel threads or tasks simultaneously [0:08:12]. The primary task is no longer the manual act of writing code but rather steering the agents and providing iterative feedback.
The Evolution of Software Engineering: From IC to Technical Manager
[0:08:33] Looking toward the next one to two years, a specific metaphor comes to mind from a classic programming textbook titled SICP, or Structure and Interpretation of Computer Programs. At MIT, this was the introductory textbook for a long time and developed a significant cult following. It teaches programming through a dialect of Lisp called Scheme, introducing students to functional programming in a way that is often described as mind-opening.
Programming as a Discipline of Sorcery
[0:09:20] The beginning of the book describes programming as a discipline comparable to sorcery. It posits that software engineers are essentially wizards and that programming languages are incantations: by issuing these spells, engineers make the program perform specific tasks. The core challenge of the craft is determining exactly which incantation is required to achieve the desired outcome. Although the book was first published in 1985, the metaphor has persisted and is now playing out more literally in the era of vibe coding.
The Sorcerer’s Apprentice and High-Leverage AI
[0:10:11] The current wave of AI represents the next stage of this evolution. With tools like Codex or Cursor, developers are literally using incantations by telling the system exactly what they want. However, this shift brings to mind the Sorcerer’s Apprentice from Fantasia. In that story, Mickey Mouse finds the sorcerer's hat and uses spells to automate his tasks, but the brooms eventually go out of control because he doesn't fully understand the power he is wielding. He even falls asleep while the brooms continue their work, leading to a flood that only the old sorcerer can clean up.
[0:11:07] This is an apt analogy for modern engineering. When an engineer manages 20 different threads simultaneously, there is immense leverage, but it requires significant seniority and skill to ensure the models do not go off the rails. You cannot simply ignore the process; you must steer it. For a proficient senior engineer, these tools make the work feel magical, casting spells and having software execute them at scale.
The Genie and the Monkey’s Paw
[0:11:57] Another useful framework for this interaction is a genie granting wishes. One must be extremely clear about what one is wishing for, because the genie follows instructions literally. If you aren't precise about the parameters, such as how "big" you want something to be, the results may vary. There is also the risk of a monkey's paw scenario, where you get exactly what you asked for, but with unforeseen and potentially negative side effects. Despite these risks, the wizard remains the defining image of the field; fittingly, SICP is still widely known as the Wizard Book.
[0:12:22] We have reached a point where the metaphor of the wizard book is a reality. There is, however, an emerging stress associated with managing these systems. When Codex agents are deployed, there is a constant need to monitor their progress. If an agent fails to perform, it can feel like a significant waste of time. [0:12:44] This is a pivotal moment in the technology because the models and tools are not yet perfect, and we are still learning the optimal ways to interact with AI agents to accomplish complex tasks.
The 100% Codex Codebase Experiment
[0:13:02] An internal team at OpenAI is currently running an experiment to maintain a codebase that is 100% written by Codex. While most developers might use AI to generate a draft and then manually rewrite or polish the code, this team has committed entirely to the AI-driven approach. [0:13:28] They face the unique challenge of having no escape hatch. If they cannot get the agent to build a specific feature, they are not permitted to step in and write the code themselves. They cannot rely on manual intervention or switch to tools like Cursor or standard tab completion to fix the issue. [0:13:49] This constraint forces a different kind of problem-solving: figuring out exactly how to guide the agent to the desired outcome.
Context as the Primary Bottleneck
[0:13:54] The learnings from this experiment, which will likely be detailed in an upcoming blog post, have highlighted several best practices. A frequent discovery is that when a coding agent fails, the root cause is almost always a lack of context. The instructions might be underspecified, or the agent may lack access to the necessary information required to complete the task. [0:14:26] To overcome this, the engineers must focus on adding documentation and encoding tribal knowledge directly into the repository. This involves improving code comments, refining the code structure, and creating Markdown files or other resource types that define specific skills. By enriching the environment with this information, the model can better understand and execute its responsibilities. This process of removing the manual escape hatch is essential for identifying the problems that must be solved to fully transition to an agent-based workflow.
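The practice described above, encoding tribal knowledge as Markdown files in the repository so an agent can find it, can be illustrated with a minimal sketch. It assumes a hypothetical agent harness that simply prepends every Markdown file in the repository to the task prompt, up to a character budget. The function name `collect_repo_context` and the 20,000-character default are illustrative assumptions, and this is not a description of how Codex itself loads context.

```python
import tempfile
from pathlib import Path


def collect_repo_context(repo_root: str, max_chars: int = 20_000) -> str:
    """Concatenate the repository's Markdown files into one context string.

    Hypothetical helper for an agent harness: files are read in sorted path
    order, each under a "## <relative path>" header, and collection stops at
    the first file that would push the total past max_chars.
    """
    root = Path(repo_root)
    parts: list[str] = []
    used = 0
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8").strip()
        chunk = f"## {path.relative_to(root).as_posix()}\n{text}\n"
        if used + len(chunk) > max_chars:
            break  # stay within the (assumed) prompt budget
        parts.append(chunk)
        used += len(chunk)
    return "\n".join(parts)


if __name__ == "__main__":
    # Build a throwaway repo with an overview file and one skill-style doc.
    with tempfile.TemporaryDirectory() as repo:
        Path(repo, "docs").mkdir()
        Path(repo, "README.md").write_text("Service overview.", encoding="utf-8")
        Path(repo, "docs", "testing.md").write_text(
            "Run the full test suite before opening a PR.", encoding="utf-8"
        )
        print(collect_repo_context(repo))
```

The budget check is the main design choice: richer documentation only helps if the most relevant parts still fit in what the model sees.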
Automated Code Review and Engineering Satisfaction
[0:15:08] The increase in development velocity has led to a surge in the number of Pull Requests (PRs) being submitted. This makes code review a massive challenge. [0:15:27] Currently, Codex is used to review 100% of the PRs for this team. One of the most interesting shifts is that the tasks handed to the models are typically the ones that humans find the most tedious or boring. This transition actually makes software engineering more fun, as developers are freed to focus on more creative and engaging work.
Overcoming the Manual Review Burden
[0:15:50] Historically, code review has been a significant pain point for many engineers. [0:15:56] Sherwin experienced this burden firsthand at his first job, at Quora, where he owned the newsfeed code. Because the newsfeed was a central component that many other developers touched, its owner had to review a constant stream of changes. Every morning started with a queue of 20 to 30 code reviews, a daunting and repetitive task that occupied a large portion of the workday. [0:16:17] Facing that volume of work every day required finding ways to get through it efficiently.
Efficient Code Reviews with Codex
[0:16:18] If you procrastinate on those daily reviews, the backlog can quickly grow to 50 or more. At OpenAI, we have found that Codex is exceptionally good at reviewing code. GPT-5.2 in particular has become extremely adept at this, especially when steered in the right direction. We create a high volume of pull requests (PRs), and Codex reviews all of them.
This automation transforms code review from a 10 to 15 minute task into one that often takes only 2 to 3 minutes, as the model provides a set of suggestions that are already baked into the PR. [0:16:52] For smaller PRs, we have reached a point where human review is sometimes unnecessary. We trust Codex in this capacity; the original author can simply review the model's findings. Since the primary benefit of a code review is having a second pair of eyes to prevent errors, Codex serves as a very capable and smart second pair of eyes.
Automating the Path to Production
[0:17:11] We have also heavily automated the general CI process and the post-push deployment workflow using Codex. Engineers are often most annoyed by the friction of getting code into production after it has been written—handling tests, lint errors, and various manual checks.
We have built internal tools to automate this process and collapse the workload. [0:17:36] For example, a lint error is a very easy fix for Codex. The model can patch the error and restart the CI process automatically. Our goal is to minimize the manual labor for engineers as much as possible, which allows them to merge and push out a much higher volume of PRs.
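The loop described above, in which the model patches a lint error and restarts CI, can be sketched as follows. This is a minimal illustration, not OpenAI's internal tooling: `run_ci` and `agent_fix` are hypothetical callables standing in for a real CI runner and a coding agent, and the retry budget is an assumption so that a stubborn failure is escalated to a human instead of looping forever.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CIResult:
    passed: bool
    lint_errors: List[str] = field(default_factory=list)


def run_with_autofix(
    run_ci: Callable[[], CIResult],
    agent_fix: Callable[[List[str]], None],
    max_fix_attempts: int = 3,
) -> bool:
    """Run CI, letting an agent patch lint errors and rerun, within a fixed budget.

    Returns True once CI passes. Returns False when a failure is not a lint
    error (e.g. a failing test) or the fix budget runs out, signalling that a
    human should take over.
    """
    for attempt in range(max_fix_attempts + 1):
        result = run_ci()
        if result.passed:
            return True
        if not result.lint_errors or attempt == max_fix_attempts:
            return False  # escalate: not auto-fixable, or out of attempts
        agent_fix(result.lint_errors)  # hypothetical agent edits files; loop reruns CI
    return False


# Usage with stand-ins: a "repo" with two lint errors that the fake agent clears one per pass.
pending = ["E501 line too long", "F401 unused import"]
ok = run_with_autofix(
    run_ci=lambda: CIResult(passed=not pending, lint_errors=list(pending)),
    agent_fix=lambda errors: pending.pop(),
)
print(ok)  # True, after two fix-and-rerun cycles
```

Restricting automatic fixes to lint errors mirrors the example in the text; other failure types are handed back rather than guessed at.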
Model-on-Model Oversight
[0:17:53] Having Codex write code and then review its own output does introduce a circularity. To avoid a Sorcerer’s Apprentice scenario where the "brooms go crazy," we are very thoughtful about which PRs are reviewed solely by the model. Most people still personally inspect their PRs; the goal is not to reach zero human involvement, but to move from 100% attention to roughly 30% attention to help things move through the pipeline faster.
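The conversation does not spell out how OpenAI decides which PRs the model alone may review, beyond noting that smaller PRs sometimes need no human reviewer. The sketch below shows one way such a routing rule could look. The 50-line threshold and the notion of "sensitive paths" are purely hypothetical, and the share of PRs routed to a human is only a crude proxy for the human attention described above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PullRequest:
    lines_changed: int
    touches_sensitive_paths: bool  # hypothetical flag, e.g. auth, billing, or infra code


def needs_human_review(pr: PullRequest, small_pr_limit: int = 50) -> bool:
    """Decide whether a human reviewer is added on top of the model's review.

    Every PR still gets the model review, and the author reads its findings;
    this only decides whether a second, human reviewer is required.
    """
    if pr.touches_sensitive_paths:
        return True
    return pr.lines_changed > small_pr_limit


def human_review_share(prs: Sequence[PullRequest], small_pr_limit: int = 50) -> float:
    """Fraction of PRs routed to a human reviewer (0.0 for an empty list)."""
    if not prs:
        return 0.0
    return sum(needs_human_review(pr, small_pr_limit) for pr in prs) / len(prs)
```

Keeping the rule explicit, rather than leaving it to each reviewer's judgment, makes it easy to tighten if model-only reviews start letting problems through.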
[0:18:31] We test a variety of models internally and prioritize dogfooding our own technology. While we use external models less frequently, we utilize different internal variants of our models to gain multiple perspectives, which has proven to be an effective strategy.
The State of AI-Authored Code at OpenAI
[0:18:51] To provide a barometer of the current state of AI and code at OpenAI, I would not state that 100% of the code running in production today was authored by AI, as attribution is difficult. However, almost every engineer heavily uses Codex for all of their tasks. [0:19:23] I would guesstimate that the vast majority of our code at this point was authored by the model.
The Changing Role of Engineering Managers
[0:19:30] While there is much discussion about the role of the individual contributor, there is less focus on the changing role of the engineering manager. [0:19:48] Thus far, the managerial role has changed less than the engineering role, as there is no specific version of Codex for managers yet. However, I use Codex for various managerial tasks. While the impact hasn't been as drastic yet, there are emerging trends that show where the role of the manager is going.
AI and the Supercharged Top Performer
[0:20:12] It is becoming increasingly clear that Codex and AI tools more broadly empower top performers to become significantly more productive. This is a trend that likely applies across society: individuals with high agency who lean into these tools and master them will essentially supercharge their own capabilities. As a result, there is a broadening spread in team productivity, where the most effective engineers are pulling even further ahead of the pack.
A Management Philosophy for the AI Era
[0:20:51] This shift reinforces a specific management philosophy: spending the majority of one's time with top performers. While the traditional instinct might be to focus on struggling team members, the goal in an AI-driven environment is to ensure that the highest-leverage individuals are unblocked, happy, productive, and heard. When top performers use these tools, they are effectively shooting ahead, and a manager's role is to facilitate that momentum.
[0:21:10] A prime example is the team currently maintaining a codebase that is 100% Codex-generated. By letting them lean into that process and observing the results, the organization sees dividends that wouldn't be possible under more restrictive oversight. This trend of managers prioritizing time with their most exceptional talent is likely to continue as AI tools become more integrated.
Increasing Management Leverage Through Internal Knowledge
[0:21:34] While there isn't a "Codex for managers" in the literal sense of writing code, tools like ChatGPT are becoming invaluable for managing organizational knowledge. By connecting AI to internal data sources—such as GitHub, Notion documents, and Google Docs—managers can conduct deep research and understand organizational context much faster.
[0:21:51] This is particularly useful during cycles like performance reviews. Instead of manually scouring a year's worth of contributions, a manager can use AI hooked up to internal systems to generate a comprehensive research report on what a person has accomplished over the last 12 months. This type of leverage allows managers to operate with a much higher degree of insight into their team's output.
Expanding Team Size and Scope
[0:22:08] Because of this increased leverage, it is probable that managers will be able to handle much larger teams than they have historically. In software engineering, the current best practice for a manager-to-report ratio is typically six to eight people. However, just as software engineers are now managing twenty to thirty "codebases" or agents via Codex, managers will use these tools to maintain a high level of context across much larger groups.
[0:22:30] This trend is already visible in non-engineering domains like support and operations. As more tasks are passed off to AI agents, a single person can oversee more work and manage more people. In tech companies, some engineering managers are already managing significantly larger teams quite adeptly because these tools allow them to understand their team's actions and the broader organizational context with less manual effort.
From Exceptional Individuals to Organizational Best Practices
[0:23:17] As Marc Andreessen recently noted, AI makes good people better and great people exceptional. Within an organization, there is often a core group of engineers who are truly leaning into Codex and thinking through the best practices for interacting with the model.
[0:23:44] From a management perspective, the highest-leverage move is to encourage this exploration. These "pioneers" identify the techniques that work, which are then shared across the entire organization through documents and knowledge-sharing sessions. This process ensures that the breakthroughs made by exceptional individuals eventually elevate the productivity of the entire team. [0:24:14] There is a widespread sense that AI is fundamentally changing the landscape of work.
Beyond the Hype: What People Aren’t Pricing In [0:24:20]
People generally sense that AI is a massive shift that will change the world, but there are specific elements of this trajectory that remain underappreciated. [0:24:26] One of the most fascinating concepts to emerge from this wave is the idea of the one-person billion-dollar startup.
The Rise of the One-Person Billion-Dollar Startup [0:24:33]
While Sam Altman may have been the first to coin the phrase, the implication is profound. [0:24:45] If individuals become high-leverage enough, a single person could eventually build a company valued at $1 billion. This implies a massive increase in personal agency: one individual using these tools could handle every task needed to scale a business to that valuation.
Second-Order Effects: The Startup Explosion [0:25:21]
Beyond the existence of a single massive entity, the broader second-order effect is that it becomes easier for everyone to create startups in general. [0:25:32] We are likely headed toward a significant boom in both startups and Small and Medium-sized Businesses (SMBs). When anyone can build software for any purpose, the landscape becomes much more vertical-oriented.
A New Golden Age of B2B SaaS [0:26:14]
We are already seeing AI startups lean into specific domains, deeply understanding use cases to create highly effective tools. [0:26:06] In an AI-driven future, there could be 100x more of these companies. To enable a single one-person billion-dollar startup, there might be a hundred other small startups building bespoke software to support it.
This suggests we are entering a Golden Age of B2B SaaS and general software development. [0:26:36] As the friction of building software and running a company evaporates, the volume of companies will explode. While there may be one $1 billion solo startup, there could be a hundred $100 million startups and tens of thousands of $10 million startups. For a high-agency individual, a $10 million business is life-changing, and we are likely to see an explosion of people achieving that level of success. [0:27:14]
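The tiers in that picture can be tallied as a back-of-the-envelope illustration. Taking 10,000 as an assumed lower bound for "tens of thousands" (the figures in the conversation are rough), each tier carries about ten times the aggregate value of the tier above it:

```latex
\begin{aligned}
1 \times \$1\,\text{B} &= \$1\,\text{B} \\
100 \times \$100\,\text{M} &= \$10\,\text{B} \\
10{,}000 \times \$10\,\text{M} &= \$100\,\text{B}
\end{aligned}
```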
Third-Order Effects: Shifting the VC Landscape [0:27:20]
Further out, the third-order effects introduce more uncertainty but suggest a fundamental shift in the startup and Venture Capital (VC) ecosystems. [0:27:30] If the world moves toward micro-companies where only one or two people own and work at the business, the traditional VC model will have to change.
We might end up with a handful of large players offering platforms that support a massive tail of smaller companies. [0:27:49] The pool of venture-scale startups, those capable of returning 100x or 1000x on an investment, might actually shrink. While a multitude of $10 million to $50 million companies is incredible for the individuals running them, they do not provide the returns that traditional venture capital requires. [0:28:05] This shift favors the high-agency individual who uses AI to build a self-sustaining, highly profitable business. [0:28:11]
The Limits of Prediction [0:28:18]
While the conversation explored deep structural changes, predicting fourth-order effects remains a challenge. Sherwin noted that thinking that far ahead is currently too speculative, as the profound shifts in agency and economics are still in their early stages. [0:28:21]
The Challenge of the Billion-Dollar One-Person Startup
[0:28:26] The complexity of these AI transitions is like the movie Inception, where every layer you go deeper causes everything to move slower. This brings up the much-discussed concept of the billion-dollar one-person startup. There is reason to be skeptical about this, largely due to the hidden costs of human interaction.
[0:28:36] Even if what you are doing is not venture-scale or high-leverage, the sheer volume of support tickets can be overwhelming. It is difficult to imagine a single person managing a billion-dollar company because of the support costs. Unless your average contract value is extremely high and you have very few customers, dealing with the most "ridiculous" support requests is nearly impossible to scale alone.
[0:29:06] People often have the ability to solve their own problems, yet they choose to email support anyway. This human element is incredibly hard to automate away completely. Scaling to a billion dollars without at least some help—perhaps from contractors, though it is debatable if that still counts as a one-person company—is a massive hurdle. AI will likely only take a founder so far in managing that burden.
A Network of Tailored Support
[0:29:32] However, there is an alternative view. It is possible that something like Lenny’s podcast could eventually become a billion-dollar startup. The mechanism for this wouldn't necessarily be one person using an AI to fix every support ticket personally. Instead, we might see a "smattering" of other small startups building software that is hyper-tailored to specific needs.
[0:30:05] We could see ten or twenty different startups that exist solely to build support software specifically for podcasts and newsletters. These support companies might themselves be one-person startups. Because the cost of writing software and building products is collapsing so rapidly, individuals can code these niche products very easily.
[0:30:30] In this world, the "billion-dollar founder" reduces the size of their company by outsourcing specialized tasks to other high-leverage, AI-enhanced tools. While there is high uncertainty about how this plays out, the end result could still be a single person driving a massive, high-leverage company that reaches a billion-dollar valuation.
The Reality of Distribution and Attention
[0:30:57] Consider the current experience of Peter, who builds OpenClaw (previously known as Clawdbot and Moltbot). He is being "barraged" by an endless stream of emails, pings, direct messages, and pull requests, an intensity likely similar to the "craziness" that followed the launch of ChatGPT. Remarkably, he is handling all of this attention as one person, without yet making any money from the project.
[0:31:27] This leads to a potential fourth-order effect: distribution becomes increasingly important. Because so many things are constantly competing for a user's attention, people who already have an audience and a platform become significantly more valuable. In an era of infinite product creation, that established reach is a major asset.
Core Management Lessons from the AI Frontier
[0:31:42] Transitioning to the topic of leadership, managing the team that builds the platform powering the entire AI economy—where almost every AI startup is building on the OpenAI API—requires a specific approach. While the context of the work is cutting-edge, many core management lessons remain timeless.
[0:32:12] One successful principle is choosing to spend more time with top performers. Management philosophies often evolve, but certain core beliefs stay the same. In a high-stakes engineering environment, focusing your energy on the individuals who are driving the most value has proven to be a key to success.
The Engineer as Surgeon
To be very concrete, a core management principle involves spending more than 50% of your time with your top 10% performers. The goal is to empower them as much as possible. This approach stems from an analogy found in the book The Mythical Man-Month, which compares a software engineer to a surgeon.
Written in the 1970s, the book predicted that software engineering might move toward a model where one person does the core work while everyone else in the room exists solely to support them. In a surgery room, the nurse, the resident, and the fellow are there to provide the surgeon with whatever tools they need exactly when they need them, such as a scalpel or a specific machine.
While software engineering has turned out to be much more collaborative than that specific prediction, the analogy remains a powerful management tool. The objective is to make the engineer feel like a surgeon by ensuring they have everything required to do their work. It should feel as though they have an army of people supporting them and looking around corners, even when it is just the manager providing that support.
Unblocking Through Organizational Insight
Looking around corners and unblocking people from an organizational perspective is extremely useful, especially in the current AI economy. When engineers are cranking through code and submitting pull requests, the main bottlenecks tend to be organizational or process-oriented rather than technical.
The best-case scenario for a manager is to have the scalpel ready before the surgeon even asks for it. By anticipating needs and removing friction, a manager allows the team to maintain its momentum. This metaphor has remained a constant guide throughout years of managing engineering teams at OpenAI.
Using AI to Predict Team Blockers
There is significant potential for AI to assist in this role of looking around corners. While not yet implemented, one could theoretically hook ChatGPT up to company knowledge bases like Notion or Slack to identify active blockers.
By analyzing messages and documentation, the AI could determine what is currently slowing the team down and suggest ways for a manager to help. Even more interestingly, the AI could be used to anticipate second and third-order blockers that might affect an engineer or a team in the coming months. Using a model to predict these future obstacles allows for a much more proactive and predictive management style.
Datadog and the Power of Product Analytics
[0:36:20] Product managers use Datadog every day to connect product insights to issues such as bugs, UX friction, and business impact. It begins with product analytics, where PMs can watch replays, review funnels, examine retention, and explore growth metrics. While other tools stop there, Datadog goes further by helping diagnose the specific impact of funnel drop-offs and friction.
Once you know where to focus, experiments prove what works. This was evident at Airbnb, where the experimentation platform was critical for analyzing what worked and where things went wrong. The same team that built the experimentation system at Airbnb built Eppo, which is now part of Datadog.
Datadog lets teams go beyond the numbers with session replay, heatmaps, and scroll maps that show exactly how users interact with a product. These capabilities are powered by feature flags tied to real-time data, allowing for safe rollouts, precise targeting, and continuous learning. Datadog is where product teams learn faster, fix smarter, and ship with confidence. Demos are available at datadog.com/lenny.
Challenges in AI Deployment and Negative ROI
[0:37:30] Many companies implementing AI platforms are actually finding a negative ROI on their deployments. While quantitative numbers are difficult to measure precisely, there is a general sentiment outside of the tech industry that AI is being forced upon users. This negative ROI is often a symptom of how these technologies are being deployed.
A major factor is the Silicon Valley bubble. Insiders in the tech world often forget that they live in a bubble where everyone is highly "AI-pilled" and follows every model release. Most people in the United States are not software engineers and are out of the loop regarding how to use this technology effectively.
The Gap Between Power Users and Basic Usage
[0:39:10] While power users on platforms like X lean into advanced techniques like skills, agents, and MCPs, the actual employees at many companies are often attempting the most basic tasks. They frequently have very little understanding of exactly how the technology works. They ask simple questions and are not yet pushing the boundaries of what the tools can do.
An ideal AI deployment setup requires a combination of top-down buy-in and bottoms-up adoption. Top-down buy-in comes from the C-suite and executive leadership who want the organization to become an AI-first company. They provide the support and purchase the necessary tools.
Bottoms-Up Adoption and the OpenAI Experience
[0:40:20] However, top-down support must be paired with bottoms-up adoption. This involves actual employees who are excited about the technology and willing to learn, evangelize, and share best practices within the organization. Internal knowledge sharing is critical for success.
This pattern was observed internally at OpenAI. While the company always intended to be AI-centric, the technology truly started taking off among staff with the introduction of Codex and related tools, which allowed employees to begin building and integrating AI into their own workflows.
[0:40:46] Applying AI to specific tasks is critical because every individual's work is unique. Software engineering is fundamentally different from finance, which is different from operations, which in turn is different from go-to-market and sales. There are many last-mile intricacies of work that must be handled from the bottom up.
[0:41:08] Many AI deployments suffer because they lack bottom-up adoption. When an initiative is purely an executive mandate, it is often divorced from what the actual work really looks like. As a result, companies end up with a large workforce that does not understand the technology. Employees might feel pressured because AI use is part of their performance reviews, but they are unsure what to do and have no peers to learn from.
Establishing an Internal Tiger Team
[0:41:34] The recommendation for companies pushing AI adoption is to find or staff a full-time tiger team internally. This team can explore the full extent of AI capabilities, apply them to specific workflows, and handle knowledge sharing. They create excitement among those who want to use the technology. In the absence of such a team, it is extremely difficult for employees to pick up these new tools.
[0:41:58] These teams are often most effective when they are not solely led by software engineers. Many companies do not have software engineers at all, and even when they do, technical-adjacent people are often the ones who get most excited about these tools.
[0:42:25] This might be a support team member or an operations lead who does not code but is an Excel wizard. These individuals are technical enough to leverage the tools and are often the ones who really light up when using them. While software engineers understand the technology, they are expensive and difficult to find, making these other types of folks ideal for the team.
The Bottom-Up Strategy for Success
[0:43:02] The major anti-pattern in the industry is a top-down mandate where the CEO or executive team decides the company will go AI first and judges everyone based on productivity increases without creating a bottom-up movement. Without a team spreading the gospel across the organization, these top-down mandates rarely work.
[0:43:27] Instead, the best management philosophy is to find the high performers in AI adoption and empower them. They should be encouraged to run hackathons, hold seminars, and share knowledge to plant the internal seeds of excitement.
The Limitation of Customer Feedback in AI
[0:44:06] Another important perspective is that talking to customers and listening to their feedback is not always the right strategy in the AI field and can occasionally lead a company astray. While talking to customers is generally useful, the AI landscape and the models themselves are changing so quickly that they tend to disrupt themselves.
[0:44:40] This disruption is especially common in the tooling and scaffolding space. A quote from an article by Nicholas, the founder of a startup called FinTool, captures this dynamic.
The Models Will Eat Your Scaffolding
[0:44:52] Nicholas shared the best practices he learned through building AI agents for financial services at his startup, FinTool. He utilized a phrase that has since become quite prominent in the industry: "the models will eat your scaffolding for breakfast."
[0:45:05] Looking back to 2022 when ChatGPT first launched, the models were still relatively raw. Developers built extensive product scaffolding around them, particularly in the developer space, to steer the model and force it to perform as desired. This era saw the rise of agent frameworks and vector stores, which were incredibly popular tools at the time.
[0:45:30] As the field has evolved, the models have improved so much that they have literally begun eating that scaffolding. Logic that once had to live in complex external code is now handled by the models themselves.
The Evolution of Context and Frameworks
[0:45:42] This phenomenon remains true today. Currently, skills files and file-based context management are the fashionable forms of scaffolding. However, it is easy to foresee a world where these are no longer useful because the model can manage all of that independently, or a new paradigm will emerge that makes file-based skills obsolete.
[0:46:06] We have already seen this play out with agent frameworks, which are now arguably less useful than they were previously. In 2023, many believed that vector stores would be the primary way to bring organizational context into models. The strategy was to embed every bit of a corpus and then perform complex vector searches to surface the right information at the right time.
[0:46:31] All of that effort was essentially scaffolding built to compensate for models that were not yet good enough. As models improve, a more effective approach is often to remove that logic and trust the model, giving it basic search tools instead. This does not strictly require a vector store; it can be as simple as files on a file system, such as skill files or an AGENTS.md file, to steer the model. Vector stores still have their place, but the ecosystem built on the assumption that they are the essential piece of scaffolding has changed.
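To make the "basic tools for search" idea concrete, here is a minimal sketch of a plain substring search over files such as AGENTS.md or skill files, whose results an agent harness could return to a model as a tool result. The function name `search_files` and its return shape are hypothetical, not an OpenAI API; a real tool would likely add ranking, file-type filters, and size limits.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def search_files(root: str, query: str, max_hits: int = 20) -> List[Tuple[str, int, str]]:
    """Case-insensitive substring search over UTF-8 text files under `root`.

    Returns (relative_path, line_number, line_text) tuples: plain results that an
    agent harness could hand back to a model as the output of a search tool.
    """
    hits: List[Tuple[str, int, str]] = []
    needle = query.lower()
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((str(path.relative_to(root)), lineno, line.strip()))
                if len(hits) >= max_hits:
                    return hits
    return hits


if __name__ == "__main__":
    # Demo on a throwaway directory containing a single AGENTS.md file.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "AGENTS.md").write_text("Run tests with `make test`.\n", encoding="utf-8")
        print(search_files(tmp, "make test"))
```

There is no embedding step and no index to keep in sync: the model decides what to search for, and the tool only returns matching lines.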
Avoiding the Local Maximum
[0:47:07] This ties back to the idea that you should not always listen to your customers. Because the field is moving so fast, many people are stuck in a local maximum. If you blindly follow customer feedback, they will ask for better vector stores or better versions of current agent frameworks.
[0:47:27] If you only chase those specific requests, you risk building for a local maximum that will soon be surpassed. As the models get better, developers must constantly reinvent and rethink the abstractions and tools they build around these systems.
[0:47:43] It is a moving target, which makes the work both exciting and challenging. The current set of tools and frameworks will likely need to evolve significantly as models get smarter. Building in this space requires balancing immediate feedback with a vision of where models will trend over the next one to two years.
The Bitter Lesson of AI Implementation
[0:48:11] This situation is a modern application of the "bitter lesson" in AI and machine learning. That lesson teaches that the less you overcomplicate or add manual logic to these systems, the better they can scale and grow. The goal is often to step back and give the system more compute power to get smarter on its own.
[0:48:32] There is now a version of the bitter lesson applied to building with AI. While developers try to architect complex structures around the models, the models eventually "eat" those structures away. Even the OpenAI API team has been guilty of this, occasionally building complex features only to find that the model's own progress rendered them unnecessary. [0:48:45]
The Bitter Lesson in API Development
The OpenAI API team has occasionally taken strategic turns that, in hindsight, may have been unnecessary. However, the models continue to improve regardless, and the team finds itself learning the bitter lesson every single day [0:48:52]. For those currently building on the API or developing agents, the most important strategy is to account for the rapid pace of this evolution.
Building for Future Capabilities
The core advice for developers is to build for where the models are going, not where they are today [0:49:12]. It is clearly a moving target, and the most successful startups are often those that design products around an ideal type of capability that might only be 80% of the way there with current technology.
Initially, such a product might only "kind of work" or be "almost there." However, as the models improve, perhaps moving from a model like o3 to something like GPT-5.1 or GPT-5.2, the product suddenly clicks and becomes incredible [0:49:44]. By building with model capability improvements in mind, you create an experience that is far superior to one designed under the assumption that the technology is static. Because models are advancing so quickly, developers often do not have to wait very long for these breakthroughs [0:50:16].
Trends in Task Duration and Coherence
Over the next six to twelve months, one of the most significant trends will be the length of tasks these models can perform coherently [0:50:36]. METR's benchmark, which measures the length of the software engineering tasks models can complete, offers a sobering look at this trend: plotting successive models on it shows the trajectory of progress.
Currently, frontier models can complete multi-hour software engineering tasks roughly 50% of the time. For tasks lasting just under an hour, the success rate is approximately 80% [0:51:02]. Most products today are still optimized for tasks that the model performs for only minutes at a time. Even advanced coding tools are generally designed for interactive use, often optimized for a maximum of 10-minute intervals [0:51:31]. While some power users push these tools to perform multi-hour tasks, that remains the exception.
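The two figures above (about 50% on multi-hour tasks, about 80% on tasks just under an hour) can be read as two points on a single curve. The sketch below assumes a logistic relationship between success rate and the logarithm of task length, in the spirit of METR's time-horizon metric; the specific anchors (a 2-hour 50% horizon, 80% at 60 minutes) are illustrative assumptions, not METR's published estimates.

```python
import math


def success_probability(task_minutes: float, horizon50_minutes: float, slope: float) -> float:
    """Toy logistic model of success vs. task length.

    Success is 50% when the task length equals the 50% horizon and falls off as
    log2(task length) grows past it; `slope` controls how sharply.
    """
    x = math.log2(horizon50_minutes) - math.log2(task_minutes)
    return 1.0 / (1.0 + math.exp(-slope * x))


def slope_from_anchor(horizon50_minutes: float, anchor_minutes: float, anchor_p: float) -> float:
    """Pick the slope so the curve also passes through one (task length, success rate) point."""
    logit = math.log(anchor_p / (1.0 - anchor_p))
    return logit / (math.log2(horizon50_minutes) - math.log2(anchor_minutes))


# Hypothetical anchors: 50% success at 2 hours, 80% success at 1 hour.
slope = slope_from_anchor(120, 60, 0.8)
for minutes in (15, 60, 120, 240, 480):
    print(f"{minutes:>4} min -> {success_probability(minutes, 120, slope):.0%}")
```

Under these assumptions, success falls to 20% at four hours and about 6% at eight, which is why a longer horizon on each new model generation changes what products can reasonably dispatch.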
The Future of Task Dispatching
Following the current trend, the next 12 to 18 months could see models that handle multi-hour tasks very coherently [0:51:46]. Eventually, we may reach a point where a model can handle a 6-hour or day-long task. This will change the types of products being built.
Instead of constant interaction, the workflow will shift toward "dispatching" the model to do things on its own for a while. The user's role will evolve into providing feedback rather than overseeing every step. You likely won't want a model to run wild for an entire day without any check-ins, so the product design will focus on how to give the model feedback effectively [0:52:05]. This shift will significantly expand the universe of what models can accomplish.
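A dispatch-style workflow might look roughly like the sketch below: the model works unattended, and a reviewer is consulted only at fixed check-in points, where they can continue, redirect, or stop the run. Everything here is hypothetical (`dispatch`, `RunState`, and the "stop" convention are illustrative names, not an OpenAI API), and `step` stands in for one unit of model work.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RunState:
    steps_done: int = 0
    log: List[str] = field(default_factory=list)
    stopped_reason: str = ""


def dispatch(step: Callable[[int, str], str],
             review: Callable[[RunState], Optional[str]],
             max_steps: int,
             check_in_every: int) -> RunState:
    """Run `step` unattended, pausing every `check_in_every` steps for feedback.

    `review` returns None to let the run continue, "stop" to halt it, or any other
    string as guidance that is passed to all later steps.
    """
    state = RunState()
    guidance = ""
    for i in range(max_steps):
        state.log.append(step(i, guidance))
        state.steps_done += 1
        if state.steps_done % check_in_every == 0:
            feedback = review(state)
            if feedback == "stop":
                state.stopped_reason = "stopped by reviewer"
                return state
            if feedback:
                guidance = feedback
    state.stopped_reason = "max_steps reached"
    return state
```

The design choice worth noting is that feedback is batched at check-ins rather than requested on every step, so the human's role shifts from supervising each action to steering the run.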
Multimodal Advancements in Audio
Another area of excitement for the next 12 to 18 months is the improvement of multimodal models, specifically regarding audio [0:52:18]. While models are already decent at audio, they are expected to get much better over the next six to twelve months, particularly with native multimodal speech-to-speech models.
There is also interesting work being done on new types of architectures for the multimodal audio side. Audio remains a hugely underrated domain in enterprise and business settings [0:52:51]. While coding is a frequent topic of conversation, the potential for audio-integrated business applications is immense and still largely untapped.
The Multimodal Shift to Audio
While much of the current conversation around artificial intelligence focuses on text-based interactions and coding, a great deal of human communication, including this very conversation, happens through audio. A significant portion of the world's business is conducted via audio, and many services and operations are fundamentally handled through talking. This makes audio a hugely underrated area for development [0:52:55].
The next 12 to 18 months are expected to be very exciting for this field. As native multimodal models and speech-to-speech architectures improve, there will be even more significant unlocks for what audio models can achieve in a business setting [0:53:11]. In summary, the trajectory suggests that AI agents will run for longer durations, and audio and speech will become a bigger, more central part of the core experience, evolving into a first-party, native capability [0:53:17].
The Opportunity in Business Process Automation
Beyond the technical improvements in models, there is a massive opportunity in Business Process Automation (BPA). This perspective stems from the realization that those living in the Silicon Valley bubble are accustomed to a specific type of work—software engineering, product management, and building products—that is shaped very differently from the work running the broader economy [0:53:47].
Software engineering is fundamentally open-ended knowledge work. It is about exploration and building new features rather than repeating the exact same task over and over. Tools like Codex excel here because they assist with these open-ended tasks, but engineering is not a highly repeatable process. Similar characteristics apply to data science and strategic finance [0:54:22].
Repeatable Operations vs. Open-Ended Engineering
As you move further away from the core of the technology sector, many jobs consist primarily of repeatable business processes. These are repeatable operations that have been iterated on by managers and are governed by a Standard Operating Procedure (SOP). In these roles, the goal is typically to follow the procedure accurately rather than to deviate from it [0:54:58].
In software engineering, ingenuity often involves deviating and finding new ways to solve problems. However, a vast amount of work in the world involves running through established procedures. For instance, when calling a support line or a utility company, the staff are working through a specific set of processes and rules regarding what they can and cannot do for a customer [0:55:24].
High-Determinism Automation
The general category of automating these repeatable business processes is extremely underrated because it is so different from the typical focus in Silicon Valley. There is a profound opportunity to apply AI toward making these repeatable processes easier and more efficient. The goal is to achieve high determinism while ensuring the AI is fully integrated with business data, business decisions, and the various systems within an enterprise [0:55:51].
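One way to read "high determinism" is to let a model interpret the request while the procedure itself stays in ordinary code, so the model can only choose among actions the SOP permits. The sketch below assumes that split; the SOP table, action names, the `Proposal` type, and the $50 credit limit are all hypothetical, and in a real system the proposal would come from a model call rather than being constructed by hand.

```python
from dataclasses import dataclass

# Hypothetical SOP for a utility-company support line: intent -> permitted actions.
SOP = {
    "billing_dispute": {"explain_bill", "issue_credit", "escalate"},
    "outage_report": {"log_outage", "share_eta", "escalate"},
}
MAX_CREDIT_USD = 50.0  # hypothetical limit an agent may grant without a human


@dataclass
class Proposal:
    """What a model proposes to do; filled in by hand in this sketch."""
    intent: str
    action: str
    amount_usd: float = 0.0


def enforce_sop(p: Proposal) -> str:
    """Return the action to execute, or "escalate" if the proposal breaks the SOP."""
    allowed = SOP.get(p.intent)
    if allowed is None or p.action not in allowed:
        return "escalate"
    if p.action == "issue_credit" and not (0 < p.amount_usd <= MAX_CREDIT_USD):
        return "escalate"
    return p.action
```

Because the checks are plain code, the same proposal always yields the same decision, and anything off-script goes to a human instead of being improvised.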
This area has a lot of work to be done and substantial opportunity, even if it is currently less discussed. It impacts the productivity of companies and the jobs of people who perform these repetitive, easily automated tasks. When discussing how AI will transform a large enterprise over the next 20 years, software engineering is only one part of the story. The transformation on the business process side might look even more radical, though it is unclear whether that impact will be larger or smaller than the impact on software in absolute terms [0:56:42].
The scale of the business process automation opportunity is truly massive. [0:57:12] It is arguably larger than many realize, and certainly bigger than the discourse on X (formerly Twitter) would suggest. While software engineering is a significant part of the AI story, the impact on the business process side could be substantial and potentially even more transformative in the long run.
Navigating the OpenAI Ecosystem as a Startup
[0:57:25] A frequent concern for founders building on top of AI platforms is the risk of being overshadowed or displaced by OpenAI itself. The general philosophy for startups should be to avoid overthinking where major labs are headed. In practice, startups that fail usually do so because their product does not resonate with customers, not because a large lab like Google or OpenAI moved into their space.
Conversely, companies that focus on building something users genuinely love can thrive even in highly competitive categories. Cursor [0:58:24] serves as a primary example; they have achieved significant success in the coding space by creating a product that people find indispensable. The current opportunity space is so vast that the standard rules for venture capital have shifted. VCs are now frequently investing in multiple competing companies within the same niche because the potential value is unlike anything seen previously. If you build a product that a specific group of people truly loves, you can establish a massively valuable business.
OpenAI’s Identity as a Platform Company
[0:59:18] From a strategic standpoint, OpenAI fundamentally views itself as an ecosystem platform company. This focus has been reinforced from the top by Sam Altman and Greg Brockman. The API was the company's first product, and there is a deep-seated commitment to fostering and supporting this ecosystem rather than dismantling it.
This philosophy is reflected in several key operational decisions. [0:59:44] For instance, every model released within an OpenAI product eventually finds its way into the API. Even specialized models, such as those optimized for the Codex harness, are made available to API customers. The company prioritizes platform neutrality, which includes allowing access to models without blocking competitors. Recent initiatives, such as testing a Sign in with ChatGPT feature, further demonstrate the intent to build a supportive infrastructure for other developers.
The Rising Tide Philosophy
[1:00:20] The guiding principle is that a rising tide lifts all boats. While OpenAI has grown into a large organization—an aircraft carrier in this analogy—it remains in the company’s best interest to raise the overall tide of the AI industry. Everyone benefits from a larger, more robust ecosystem, and OpenAI’s own API growth is a direct result of this open approach. The commitment remains focused on providing an open ecosystem rather than pushing others out of the way.
Alignment with the OpenAI Charter
[1:01:00] This dedication to being a platform is not a recent shift; it has been the vision from the beginning and is rooted in the official OpenAI Charter. The mission is two-fold: first, to build AGI (Artificial General Intelligence), and second, to ensure its benefits are broadly distributed. [1:01:13]
OpenAI’s Mission and the Platform Strategy
[1:01:15] The core of the OpenAI mission is to spread the benefits of artificial intelligence to all of humanity. While products like ChatGPT aim for a global reach, the company recognized as early as 2020 with the launch of its API that it could not reach every corner of the world alone. To fulfill its charter, OpenAI operates as a platform that empowers others to build specialized applications, such as customer support bots tailored for podcasters and newsletter hosts. This diversity of use cases is seen as a direct expression of the mission, ensuring that the technology reaches niches and markets that OpenAI could not serve directly.
The ChatGPT App Store and Global Scale
[1:02:12] Beyond the API, OpenAI is expanding its ecosystem through the ChatGPT App Store. This project, managed by the ChatGPT team in collaboration with the platform team, uses a dedicated Apps SDK. The store aims to leverage the massive audience of 800 million weekly active users, a figure that represents approximately 10% of the world's population and continues to grow rapidly. By allowing third-party companies to build for this audience, OpenAI intends to further accelerate the distribution of AI benefits and expand its user base.
Democratizing Access to Intelligence
[1:03:37] A fundamental aspect of the OpenAI mission is making high-level intelligence available regardless of economic barriers. Despite criticisms regarding the cost of premium tiers, the company maintains a powerful free version of ChatGPT. This ensures that the gap between the tools available to a billionaire and those available to an individual in a remote village remains minimal. The overarching goal is to raise the floor of intelligence available to everyone globally.
[1:04:12] This commitment to accessibility is particularly relevant in fields like healthcare and education. Since 2022, the capability of the free tier has increased dramatically. While the models available several years ago were impressive for their time, they are significantly less capable than the GPT-4o level models provided for free today. This creates a parity of access similar to the smartphone market: just as a billionaire uses the same iPhone as a typical consumer, a user paying $20 a month for a Plus subscription or $200 a month for a Pro subscription is using the exact same AI technology as the world’s wealthiest individuals. Most people, including billionaires, typically rely on the standard Plus tier for their daily needs.
This democratization and the spreading of AI benefits across the entire world is deeply meaningful and drives a significant portion of the work being done [1:05:12].
Exploring the Developer API and Platform
For developers looking to build on the platform, the API offers several endpoints that allow for sampling from the models [1:05:38]. The most popular endpoint currently is the Responses API, which is specifically optimized for building long-running agents.
The Responses API: A Low-Level Primitive
At its most basic level, the Responses API functions by having the user provide text to the model. The model then works for a duration while the user can poll the system to see its progress, eventually receiving a response [1:06:02]. This is the lowest-level primitive available, and it is the most popular way to build on the platform because it is entirely unopinionated. It allows developers the freedom to build essentially whatever they want without being forced into a specific framework [1:06:22].
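The submit-then-poll pattern described above can be sketched as a small helper that keeps calling a retrieve function until the response reaches a terminal status. The helper itself (`poll_until_done`) is hypothetical, and the status values are based on the public Responses API documentation at the time of writing, so verify them against the current reference. Taking `retrieve` and `sleep` as parameters keeps the sketch runnable without network access.

```python
import time
from typing import Any, Callable

# Statuses after which polling should stop (assumed from the public docs).
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "incomplete"}


def poll_until_done(retrieve: Callable[[str], Any], response_id: str,
                    interval_s: float = 2.0, timeout_s: float = 600.0,
                    sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call `retrieve(response_id)` until `.status` is terminal or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        resp = retrieve(response_id)
        if resp.status in TERMINAL_STATUSES:
            return resp
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{response_id} still {resp.status!r} after {timeout_s}s")
        sleep(interval_s)
```

With the official `openai` Python package, one plausible wiring is to create the response with `client.responses.create(model=..., input=..., background=True)` and then call `poll_until_done(client.responses.retrieve, resp.id)`; the model name is left as a placeholder here.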
The Agents SDK and Swarm Orchestration
Above the base API layer, there are increasing levels of abstraction, starting with the Agents SDK [1:06:32]. This SDK has also become extremely popular because it lets developers build more traditional agents that run in a continuous loop, calling the model and its tools until the work is done.
The Agents SDK provides the necessary framework and scaffolding to make agent development easier. It allows for the creation of guardrails and enables an agent to farm out subtasks to other agents. This effectively allows a developer to orchestrate a swarm of agents to handle complex tasks [1:07:04].
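The two ideas in this paragraph, guardrails and farming out subtasks, can be illustrated without any framework. The sketch below deliberately does not use the Agents SDK; `SketchAgent`, `run`, `plan`, and `BLOCKED_TERMS` are hypothetical names, and each `act` callable stands in for a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

BLOCKED_TERMS = ("password", "credit card number")  # hypothetical guardrail list


@dataclass
class SketchAgent:
    name: str
    act: Callable[[str], str]  # stands in for a model call
    plan: Optional[Callable[[str], List[Tuple[str, str]]]] = None  # task -> [(subagent, subtask)]
    subagents: Dict[str, "SketchAgent"] = field(default_factory=dict)


def passes_guardrail(task: str) -> bool:
    """Input guardrail: reject tasks that mention a blocked term."""
    return not any(term in task.lower() for term in BLOCKED_TERMS)


def run(agent: SketchAgent, task: str) -> str:
    """Check the guardrail, fan subtasks out to sub-agents if there is a plan, then act."""
    if not passes_guardrail(task):
        return f"{agent.name}: refused by guardrail"
    if agent.plan is None:
        return agent.act(task)
    results = [run(agent.subagents[name], subtask) for name, subtask in agent.plan(task)]
    return agent.act("\n".join(results))
```

An orchestrator is just an agent with a `plan` and a set of sub-agents; because `run` recurses, sub-agents can have their own plans, which is the essence of a swarm. The guardrail runs at every level, so a subtask that trips it is refused even if the parent task did not.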
UI Components and Evaluation Tools
Further up the stack, there are tools designed for the meta-level task of deploying an agent. A product called AgentKit, which includes widgets, provides a set of UI components [1:07:19]. These allow developers to quickly build polished user interfaces on top of the API or the Agents SDK, since many agent UIs share similar requirements.
The platform also includes evaluation products, such as the Evals API [1:07:41]. These tools allow developers to test their models, agents, or workflows in a highly quantitative way to ensure they are functioning correctly. The various layers offer different levels of abstraction and opinionation, allowing a developer to use the entire stack for rapid building or drop down to the Responses API for complete control [1:08:01].
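What "testing in a highly quantitative way" means can be shown with a minimal harness: a fixed set of cases, a grader, and a pass rate that can be tracked across model or prompt changes. This is a generic sketch, not the Evals API; `Case`, `run_eval`, and `exact_match` are hypothetical names, and `system` stands in for whatever model, agent, or workflow is under test.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Case:
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> bool:
    """Simplest grader: case- and whitespace-insensitive string equality."""
    return output.strip().lower() == expected.strip().lower()


def run_eval(system: Callable[[str], str], cases: List[Case],
             grader: Callable[[str, str], bool] = exact_match) -> Dict[str, Any]:
    """Run every case through `system`, grade it, and report a pass rate plus failures."""
    failures = []
    for case in cases:
        output = system(case.prompt)
        if not grader(output, case.expected):
            failures.append({"prompt": case.prompt, "output": output, "expected": case.expected})
    passed = len(cases) - len(failures)
    return {
        "passed": passed,
        "total": len(cases),
        "pass_rate": passed / len(cases) if cases else 0.0,
        "failures": failures,
    }
```

Exact match is only a starting point; the grader is a parameter so it can be swapped for a rubric or a model-based grader without changing the harness.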
A Unique Era in Technology
Looking ahead, the next two to three years are expected to be some of the most fun and exciting times in the tech and startup world in a very long period [1:08:32]. It is a moment that should not be taken for granted.
Reflecting on the past decade, entering the workforce in 2014 led to a few good years followed by a period of five or six years where tech felt less exciting. However, the last three years have been an incredibly energizing period, and the next few years will likely be a continuation of this wave [1:08:59]. Eventually, this period of rapid innovation will play out and become more incremental, but for now, it is a time to explore and invent new things [1:09:12].
Engaging with the Technological Shift
[1:09:14] This era offers the chance to change the world and redefine how we work. The primary advice for anyone concerned about missing the boat is to simply engage with the technology. Leaning in and building tools on top of these models is a significant part of the story, but you do not need to be a software engineer to participate.
A major part of leaning in is using the tools yourself. Many jobs are going to change, so understanding the current limitations—knowing what these models can and cannot do—is essential. By doing this, you can watch the trend and see how they improve over time. It is about getting used to the technology and gaining familiarity instead of staying passive and letting it pass you by.
Navigating the Noise and Anxiety
[1:10:07] There is a lot of stress and anxiety surrounding the rapid pace of development. Many feel pressured to keep up with every new release, such as learning a new bot every week. While being chronically online on X and internal Slack channels allows for a high level of absorption, much of what circulates is noise. You do not need to have every single piece of information enter your mind.
Leaning into just one or two different tools and starting small is often more than enough. The combination of the frenetic pace of the industry and the nature of X as a platform creates an overwhelming news cycle. To truly engage, you can take simple steps like installing a client or playing with ChatGPT by connecting it to internal data sources like Notion, Slack, or GitHub. Seeing firsthand what it can and cannot do is the most effective way to stay informed without becoming overwhelmed.
Lightning Round: Essential Reading
[1:11:39] In a series of quick-fire questions, the first topic covers book recommendations. One highly recommended fiction book is There Is No Antimemetics Division by the author qntm. Originally shared on X, this science fiction work is described as super well-written and fascinating. It follows a government agency fighting entities that make people forget they exist. Although it has elements of sci-fi horror, it is also unintentionally hilarious in parts.
On the non-fiction side, two books concerning China and US-China relations have proven particularly eye-opening over the last year. The first is Breakneck by Dan Wang. A key takeaway from this book is the analogy that the United States is a lawyerly society, whereas China is an engineering society. Both structures have their respective pros and cons, but the observation that the US is largely run by lawyers is a striking point.
Apple, China, and Quick Reads
The second non-fiction recommendation is Apple in China by Patrick McGee [1:13:15]. As a self-described Apple fan with a desk full of their products, the speaker found it fascinating to learn about the company's relationship with China. The book contains significant inside information about Apple as a corporation and is a timely page-turner. Regarding the previously mentioned book on antimemetics, it is roughly two hundred pages long and can be finished in just two days [1:13:44].
Media and the Appeal of Anime
Finding time for movies or television is difficult with two children and a demanding career [1:13:49]. However, the speaker is a significant fan of Japanese anime. Recently, they watched the first few episodes of the new season of Jujutsu Kaisen, which is season three [1:14:12]. Anime is particularly appealing because it often features novel and unique plots and universes that Western media tends to avoid [1:14:25].
Ubiquiti: The "Apple of Home Networking"
A recent product discovery involves Ubiquiti routers and security cameras [1:14:42]. After needing to set up a home network and security, the speaker moved away from simpler setups to this ecosystem. Ubiquiti is described as the Apple of home networking due to its well-built hardware and exceptional software [1:15:02]. While the hardware is beautiful, the high-quality mobile app for managing the network is what makes the product experience stand out.
Using Ubiquiti for wireless routing requires Ethernet wiring throughout the house [1:15:24]. The security cameras are a highlight, supported by an incredible mobile app as well as dedicated apps for Apple TV and iPad to monitor live feeds [1:15:35]. Although the products are somewhat pricey, the experience is considered superior to alternatives like Eero [1:15:49].
Agency and the Price of Real Estate
A favorite life motto the speaker frequently returns to is: never feel sorry for yourself [1:15:53]. In both work and life, many things will happen, but it is vital to maintain a sense of agency and the ability to pull oneself up [1:16:14].
In a previous role at Opendoor, the speaker worked on models to determine how much the company should pay for houses [1:16:24]. Several variables impacted home prices in unexpected ways. One significant factor is the proximity to high-voltage power lines [1:16:49]. These giant lines often produce a buzzing sound, and because most buyers have families, they are wary of having children near them. This has a substantial impact on the valuation.
Another critical but difficult-to-quantify variable is the floor plan [1:17:13]. While obviously important to the value of a home, it remains a challenge to mathematically quantify what constitutes a good versus a really bad floor plan.
The Challenge of Quantifying Floor Plans
[1:17:28] Quantifying what makes a good or bad floor plan proved to be an immense challenge for the team at Opendoor. They attempted to analyze various data points to capture the essence of a layout, such as the width of the kitchen, the specific style of the kitchen, and the location of the master bedroom. Despite these efforts, floor plans remained difficult to translate into code.
[1:17:39] There were instances where a home simply would not sell, and the operations team would identify the culprit as a floor plan issue. This was often a matter of intuition; an expert could walk inside and immediately feel that the layout was problematic, even though that "feel" was incredibly hard to measure programmatically.
The High ROI of Curb Appeal and First Impressions
[1:17:52] Another factor that was significantly more impactful than initially expected was general curb appeal, specifically the front door. [1:18:01] A Zillow book on the subject suggests that front door replacement often yields the highest ROI for home improvements.
[1:18:08] The importance of the first few moments a buyer spends interacting with a house as they walk up should not be underrated. What the buyer sees and feels during those initial seconds of entering the house heavily influences their overall perception of the property.
Digitizing the Physical World at Opendoor
[1:18:19] A fascinating aspect of the early days at Opendoor was dealing with the lack of digitized information. [1:18:28] At the time, floor plan data was not widely available in digital formats. There were only a handful of people who possessed paper floor plans for homes in cities like Phoenix and Dallas, creating a unique set of challenges and stories as the team worked to bring that physical information into their systems.
Connecting with Sherwin Wu
[1:18:38] Sherwin Wu can be found online via X (formerly Twitter) under the handle Sherwin Wu. He primarily shares updates regarding OpenAI, the OpenAI API, and the various products the company is launching.
[1:18:56] Sherwin is particularly interested in hearing about what people are building. If you are working on a startup or hacking on an idea, he encourages reaching out to him on X to share your project and learn more about how OpenAI can support your work.
Resources and Outro
[1:19:17] To stay updated on future episodes, listeners can subscribe to the show on Apple Podcasts, Spotify, or their favorite podcast app. Leaving a rating or review is greatly appreciated as it helps other listeners find the program.
[1:19:31] All past episodes and additional information about the show can be found at lennyspodcast.com.